Exploring weak scalability for FEM calculations on a GPU-enhanced cluster

Authors

  • Dominik Göddeke
  • Robert Strzodka
  • Jamaludin Mohd-Yusof
  • Patrick S. McCormick
  • Sven H. M. Buijssen
  • Matthias Grajewski
  • Stefan Turek
Abstract

The first part of this paper surveys co-processor approaches for commodity-based clusters in general, not only with respect to raw performance, but also in view of their system integration and power consumption. We then extend previous work on a small GPU cluster by exploring the heterogeneous hardware approach for a large-scale system with up to 160 nodes. Starting with a conventional commodity-based cluster, we leverage the high bandwidth of graphics processing units (GPUs) to increase the overall system bandwidth, which is the decisive performance factor in this scenario. Thus, even the addition of low-end, out-of-date GPUs leads to improvements in both performance- and power-related metrics.


Similar resources

A scalable hybrid algorithm based on domain decomposition and algebraic multigrid for solving partial differential equations on a cluster of CPU/GPUs

Several of the top ranked supercomputers are based on the hybrid architecture consisting of a large number of CPUs and GPUs. Very high performance has been obtained for problems with special structures, such as FFT-based image processing or N-body based particle calculations. However, for the class of problems described by partial differential equations discretized by finite difference (or othe...


OPTIMAL SOLUTION OF RICHARDS’ EQUATION FOR SLOPE INSTABILITY ANALYSIS USING AN INTEGRATED ENHANCED VERSION OF BLACK HOLE MECHANICS INTO THE FEM

One of the most crucial problems in geo-engineering is the instability of unsaturated slopes, causing severe loss of life and property worldwide. In this study, five novel meta-heuristic methods are employed to optimize locating the Critical Failure Surface (CFS) and corresponding Factor of Safety (FOS). A Finite Element Method (FEM) code is incorporated to convert the strong form of the Richar...


General-purpose molecular dynamics simulations on GPU-based clusters

We present a GPU implementation of LAMMPS, a widely-used parallel molecular dynamics (MD) software package, and show 5x to 13x single node speedups versus the CPU-only version of LAMMPS. This new CUDA package for LAMMPS also enables multi-GPU simulation on hybrid heterogeneous clusters, using MPI for inter-node communication, CUDA kernels on the GPU for all methods working with particle data, a...


Scalable Breadth-First Search on a GPU Cluster

On a GPU cluster, the ratio of high computing power to communication bandwidth makes scaling breadth-first search (BFS) on a scale-free graph extremely challenging. By separating high and low out-degree vertices, we present an implementation with scalable computation and a model for scalable communication for BFS and direction-optimized BFS. Our communication model uses global reduction for high...


Scalability of parallel finite element algorithms on multi-core platforms

The speedup of element-by-element FEM algorithms depends not only on peak processor performance but also on access time to shared mesh data. Eliminating memory boundness would significantly speed up unstructured mesh computations on hybrid multi-core architectures, where the gap between processor and memory performance continues to grow. The speedup can be achieved by ordering unknowns so that ...



Journal:
  • Parallel Computing

Volume 33

Publication date: 2007